Cross-validation for selecting a model selection procedure

Authors

  • Yuhong Yang
  • Yongli Zhang
Abstract

While there are various model selection methods, an unanswered but important question is how to select one of them for the data at hand. The difficulty arises because the targeted behaviors of model selection procedures depend heavily on uncheckable or difficult-to-check assumptions about the data-generating process. Fortunately, cross-validation (CV) provides a general tool to solve this problem. In this work, results are provided on how to apply CV to consistently choose the best method, yielding new insights and guidance for a potentially vast number of applications. In addition, we address several seemingly widespread misconceptions about CV. © 2015 Elsevier B.V. All rights reserved.
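As a rough illustration of the idea in the abstract, the following minimal sketch uses repeated random data splitting to choose between two model selection procedures. It is not the authors' algorithm: the choice of AIC and BIC as the competing procedures, the two candidate models (constant mean and simple linear regression), the 50/50 split ratio, and all function names are assumptions made for this example only.

```python
import math
import random

def fit_constant(xs, ys):
    """Fit a constant-mean model; return (predictor, number of parameters)."""
    mean = sum(ys) / len(ys)
    return (lambda x: mean), 1

def fit_linear(xs, ys):
    """Fit simple linear regression by least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return (lambda x: intercept + slope * x), 2

# Hypothetical candidate models that each procedure selects among.
CANDIDATES = [fit_constant, fit_linear]

def select_by_criterion(xs, ys, penalty):
    """Return the predictor minimizing n*log(RSS/n) + penalty(n)*k."""
    n = len(xs)
    best = None
    for fit in CANDIDATES:
        predict, k = fit(xs, ys)
        rss = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))
        score = n * math.log(max(rss / n, 1e-300)) + penalty(n) * k
        if best is None or score < best[0]:
            best = (score, predict)
    return best[1]

# The two model selection procedures being compared (an assumption).
PROCEDURES = {
    "AIC": lambda xs, ys: select_by_criterion(xs, ys, lambda n: 2.0),
    "BIC": lambda xs, ys: select_by_criterion(xs, ys, lambda n: math.log(n)),
}

def cv_choose_procedure(xs, ys, n_splits=50, train_fraction=0.5, seed=0):
    """Pick the procedure with the smallest total validation error."""
    rng = random.Random(seed)
    n = len(xs)
    n_train = int(n * train_fraction)
    total_error = {name: 0.0 for name in PROCEDURES}
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        train, valid = idx[:n_train], idx[n_train:]
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        for name, procedure in PROCEDURES.items():
            # Each procedure sees only the training part of the split.
            predict = procedure(tx, ty)
            total_error[name] += sum((ys[i] - predict(xs[i])) ** 2 for i in valid)
    winner = min(total_error, key=total_error.get)
    return winner, total_error

if __name__ == "__main__":
    data_rng = random.Random(1)
    xs = [data_rng.uniform(0, 1) for _ in range(60)]
    ys = [2.0 * x + data_rng.gauss(0, 0.5) for x in xs]
    print(cv_choose_procedure(xs, ys))
```

Each procedure is applied to the training part only, and the model it selects is scored on the held-out part; the procedure with the smaller accumulated error is then the one to apply to the full data. The split ratio and the number of splits are tuning choices in this sketch, not values taken from the paper.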

Similar resources

Asymptotic optimality of full cross-validation for selecting linear regression models

For the problem of model selection, full cross-validation has been proposed as an alternative criterion to traditional cross-validation, particularly in cases where the latter is not well defined. To justify the use of the new proposal, we show that under some conditions both criteria share the same asymptotic optimality property when selecting among linear regression models.

Cross-validation pitfalls when selecting and assessing regression and classification models

Background: We address the problem of selecting and assessing classification and regression models using cross-validation. Current state-of-the-art methods can yield models with high variance, rendering them unsuitable for a number of practical applications including QSAR. In this paper we describe and evaluate best practices which improve reliability and increase confidence in selected models. ...

Linear Model Selection by Cross-Validation

Estimator selection in the Gaussian setting

We consider the problem of estimating the mean f of a Gaussian vector Y with independent components of common unknown variance σ. Our estimation procedure is based on estimator selection. More precisely, we start with an arbitrary and possibly infinite collection F of estimators of f based on Y and, with the same data Y, aim at selecting an estimator among F with the smallest Euclidean risk. N...

A general, prediction error-based criterion for selecting model complexity for high-dimensional survival models.

When fitting predictive survival models to high-dimensional data, an adequate criterion for selecting model complexity is needed to avoid overfitting. The complexity parameter is typically selected by the predictive partial log-likelihood (PLL) estimated via cross-validation. As an alternative criterion, we propose a relative version of the integrated prediction error curve (IPEC), which can be...

Publication date: 2015